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Abstract. To address the uncertainty in outputs of numerical weather pre¬ 
diction (NWP) models, ensembles of forecasts are used. To obtain such an 
ensemble of forecasts the NWP model is run multiple times, each time with 
different formulations and/or initial or boundary conditions. To correct for 
possible biases and dispersion errors in the ensemble, statistical postprocessing 
models are frequently employed. These statistical models yield full predictive 
probability distributions for a weather quantity of interest and thus allow for 
a more accurate assessment of forecast uncertainty. This paper proposes to 
combine the state of the art Ensemble Model Output Statistics (EMOS) with 
an ensemble that is adjusted by an AR process fitted to the respective error 
series by a spread-adjusted linear pool (SLP) in case of temperature forecasts. 
The basic ensemble modification technique we introduce may be used to sim¬ 
ply adjust the ensemble itself as well as to obtain a full predictive distribution 
for the weather quantity. As demonstrated for temperature forecasts of the 
European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble, 
the proposed procedure gives rise to improved results upon the basic (local) 
EMOS method. 


1. Introduction 

Weather forecasting is usually based on outputs of numerical weather predic¬ 
tion (NWP) models that represent the dynamical and physical behaviour of the 
atmosphere. Based on actual weather conditions, the equations are employed to 
extrapolate the state of the atmosphere, but may strongly depend on initial con¬ 
ditions and other uncertainties of the NWP model. A methodology that accounts 
for such shortcomings is the use of ensemble forecasts by running the NWP model 
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several times with different initial conditions and/or model formulations (Gneiting 


et al. 

2005 

Leutbecher and Palmer 

OC 

o 

o 

CM 


Forecast ensembles play an important role when it desired to develop meth¬ 
ods that transfer from deterministic to probabilistic forecasting, since information 
about forecast mean and variance may be extracted from several individual fore¬ 
casts (Palmer, [2002). In practise, however, ensemble prediction systems are not 
able to capture all sources of uncertainty, thus they often exhibit dispersion er¬ 
rors (underdispersion) and biases. Statistical postprocessing models correct the 
forecasts in accordance with recent forecast errors and observations and yield full 


predictive probability distributions (see e.g. 

Gneiting and Raftery 

2005 

Wilks and 

Hamill 

2007 Gneiting and Katzfuss 

2014 



There are two widely used approaches in statistical postprocessing that yield 
full predictive distributions based on a forecast ensemble and verifying observa¬ 
tions. In the Bayesian model averaging (BMA, Raftery et al. 2005 ) approach each 
ensemble member is associated with a kernel density (after suitable bias correction) 
and the individual densities are combined in a mixture distribution with weights 
that express the skill of the individual ensemble members. Ensemble model out¬ 
put statistics (EMOS, |Gneiting et al. 20051 combines the ensemble members in a 
multiple linear regression approach with a single predictive distribution. |Gneiting| 
et al. (20051 and Raftery et al. (20051 apply the postprocessing methods to weather 


quantities, where a normal distribution can be assumed as underlying model, such 
as temperature and pressure. For the application to other weather quantities alter¬ 
native distributions are required. An overview on existing variants of EMOS and 


(2014). 


BMA can for example be found in Schefzik et al. (2013) and Gneiting and Katzfuss 


In cases where various probabilistic forecasts from different sources are avail¬ 
able, combining these forecasts can improve the predictive ability further. The 
individual forecasts might for example (as in our application) come from different 
competing statistical postprocessing models. The most widely used method to com¬ 
bine the individual predictive distributions is the linear pool (LP), see for example 
Gneiting and Ranjan (2013), Ranjan and Gneiting (2010) and references therein 


for reviews on the topic. Gneiting and Ranjan (2013) show that the linear pool re¬ 


sults in an overdispersed predictive distribution, regardless whether the individual 
components are calibrated or not. A more flexible and non-linear approach is the 


spread-adjusted linear pool (SLP). For example, Berrocal et al. (2007) and Kleiber 


et al. (2011) empirically observed overdispersion of the linear combined forecasts 


in case of approximately neutrally dispersed Gaussian components and they intro¬ 
duced a non-linear spread-adjusted combination approach, that was generalized and 


discussed by Gneiting and Ranjan (2013). Further, Gneiting and Ranjan (2013) 


and Ranjan and Gneiting (2010) propose another flexible non-linear aggregation 


method, the Beta-transformed linear pool (BLP), resulting in highly improved dis¬ 
persion properties. 

Although weather prediction implies temporal structures and dependencies, we 
were unable to find explicit applications of time series methods in the context of 
ensemble postprocessing methods. On the contrary, there are several approaches 
to apply time series models directly on observations of a weather quantity or other 
environmental and climatological quantities. To name only a few, Brown et al. 


(1984) apply an AR(p) model to transformed wind speed data in order to simulate 
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and predict wind speed and wind power observations. Katz and Skaggs (1981) 


investigate the problems when fitting ARMA models to meteorological time series 
and, as an example, present an application to time series of the Palmer Drought 
index. Al-Awadhi and Jolliffe (1998) fit specific ARMA models to surface pressure 


data. Milionis and Davies (1994) apply the Box-Jenkins modelling technique to 
monthly activity of temperature inversions. 

We propose to apply time series models in the context of ensemble postprocess¬ 
ing. To account for an autoregressive (AR) structure in time we construct a pre¬ 
dictive distribution based on an AR-adjusted forecast ensemble (local AR-EMOS). 
As the standard local EMOS predictive distribution shows signs of underdispersion 
and our local AR-EMOS distribution on the contrary clearly exhibits overdisper¬ 
sion we propose to combine AR-EMOS and EMOS with a spread-adjusted linear 
pool. 

The structure of the paper is the following: Section [2] presents our proposed AR 
modification of the forecast ensemble and briefly reviews the EMOS model and 
the spread-adjusted linear combination of probabilistic forecasts. Sections [3] and 
[4] illustrate some possible applications of our basic AR-modification method in a 
case study with temperature forecasts of the European Center for Medium Range 
Weather Forecasts (ECMWF). We end the paper with some concluding remarks 
and a discussion of further extensions of our method. 


2. Methods 


2.1. Ensemble model output statistics. Gneiting et al. (20051 introduced the 
ensemble model output statistics (EMOS) model for the case, that a normal distri¬ 
bution can be assumed as model for the weather quantity. In all following sections 
we use a notation depending on the time index t , to stress the fact that we explicitly 
model temporal dependencies through our autoregressive adjustment approach. 

The predictive distribution is obtained by fitting the following multiple linear 
regression model to the observation y(t) of the weather quantity Y(t) and the 
forecast ensemble {A'i(t),... ,X m (t)}: 


( 1 ) 


Y(t) — a + &iA'i(f) + ... + b rn X rn {t) + s(t), 


where a, 61 ,..., b m ft are real valued regression coefficients that can be interpreted 
as bias-correction coefficients, and e{t) is a normally distributed error term with 
e(t) ~ A/"(0, er 2 (t)). For convenience we define £(t) = a + b±Xi(t) + ... + b m X m (t), 
yielding the representation 

( 2 ) Y(t) = m+m 


of the EMOS model. 

In case of an exchangeable ensemble, the multiplicative bias correction parame¬ 
ters are chosen to be equal, that is 6* = b, for i = 1,..., m. In this case, the EMOS 
model is given as 

(3) Y(t) = a + bX(t) + e(t), 

where X(t) = Y^lLi Xi(t)/m and we can define £(f) = a + bX{t). 

The variance of e(t) is parameterized as linear function of the ensemble variance 
to account for dispersion errors in the raw ensemble: 

(4) Var(e(t)) = a 2 (t) = c + dS 2 (t), 
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Figure 1. The 383 observation stations in Germany used for the 
analysis, where the filled circle indicates station Frankfurt a.M. 


where S 2 (t) = — X(t )) 2 /(to — 1) and c, d € R + are nonnegative coeffi¬ 

cients. 

Models 0 and 0 both yield the following predictive EMOS distribution: 

(5) Y{t)\Xi{t ),..., X m (t) ~ Affat), a 2 (t )). 


The parameters of the distribution are estimated from a training period preceding 
the forecast time and re-estimated for each forecast time using a rolling training 
period of fixed length. Gneiting et al. (2005) propose to estimate the parameters 
by optimizing the continuous ranked probability score (CRPS, Wilks 20111 over 
the training data. This estimation procedure is implemented in the R package 


ensembleMOS (R Core Team 2015 Yuen et al. 2013). 


Two variants of estimating the EMOS parameters exist, the global and the local 
EMOS approach. For global EMOS only one set of parameters is estimated, while 
for global EMOS a separate set is estimated at each station. 

For our case study we consider a station-wise approach and therefore employ the 
local EMOS variant. 


2.2. The AR modification technique. Let again {Xi(f),..., X m (t)} denote an 
ensemble of forecasts for a univariate (normally distributed) weather quantity Y ( t .) 
at a fixed location. Let r/(t) denote a deterministic-style forecast of Y(t) with 
corresponding forecast error 


( 6 ) 


Z(t) :=Y(t)-rj{t) . 
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Simple examples for 77 (f) are provided by any ensemble member Xi(t) itself, the 
raw ensemble mean X(t) = Xi(t)/m, or the raw ensemble median denoted 

by X. 

If 77 (f) is a one-step-ahead forecast made at origin t — 1, then 77 (f) may be seen 
as appropriate, if the time series {Z(t)} can be viewed as a white noise process. If 
there is some indication that this property is violated, one may readily assume that 
the series { Z(t )} follows a weakly stationary AR(p) process, i.e. 

v 

(7) Z(t) - n = ^2,a j [Z(t- 3 ) - n\+e(t) , 

3 =1 

where {er t } is white noise. Combing § and 0 gives Y(t) = r](t) + e(t), where 

v 

( 8 ) 77 (f) = 77 (f) + /U + ^2 otj [Y (f - j) - T)(t - j) - p\ 

3 =1 


can be seen as an AR modified forecast based on the actual forecast 77 (f) and past 
values Y(t — j) and r](t — j), j = 1,... ,p. The coefficients n, op, ..., a p may be 
obtained by fitting an AR(p) process to the observed error series {Z(t)} from a 
training period, where the order p of the process can automatically be chosen by 
applying an appropriate criterion. This includes the incidence p = 0, in which case 
77 (f) is a simple bias correction of 77 (f). For the actual fitting we employ Yule-Walker 


estimation as carried out by the R function ar, see also Shumway and Staffer (2006 


Section 3.6). Order selection is done by the AIC criterion, despite the circumstance 
that the estimated coefficients are not the maximum likelihood estimates. 

The described approach differs from the times series methods introduced e.g. by 


Brown ef al. ( 1984) in the sense that it does not aim at directly modelling a weather 


quantity itself. 

The basic AR modification method can be employed for different purposes in 
ensemble postprocessing, depending on the need of the user. For example it can 
simply be used to obtain an AR-modified raw ensemble, or to build a postprocessed 
predictive distribution based on the modified ensemble. The following sections 
illustrate some possible applications of AR modified ensemble forecasts by means 
of a given data set, where we analyze the aggregated predictive performance and 
the performance at a single station. An AR-EMOS predictive distribution for 
temperature is introduced and shown to improve upon the well-established local 
EMOS method with respect to certain verification scores. 

See e.g. Wilks (2011) as a reference for applied statistical methods for atmo¬ 


spheric sciences and Gneiting and Katzfuss (2014) for a comprehensive review of 
probabilistic forecasting. 


2.3. Spread-adjusted linear combination of predictive distributions. Let 

7?^ denote the class of non-random cumulative distribution functions (CDFs) that 
admit a Lebesque density, have support on R and are strictly increasing on R. 
Further, F- t ,... ,F k e T >^ denote the considered CDFs with Lebesgue densities 
/ 1 , • ■ • j fki where k is an arbitrary but finite integer value. 
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Station Ruppertsecken 




Lag 


Figure 2. Series of temperature and ensemble mean (upper 
panel), series of forecast errors (middle panel), and ACF of series 
of forecast errors (lower panel) for station Ruppertsecken 


Then the spread-adjusted linear pool (SLP) combined predictive distribution 
with spread-parameter c has CDF (see Gneiting and Ranjan 2013) 


( 9 ) G c (y) = J2 w iF?( y ~^ L ) 


and density 


( 10 ) 


9c (v) = ^E 

1=1 


where w\, ..., Wk are non-negative weights with 2^=1 w; = 1, c > 0 is the spread- 
adjustment parameter and /q is the unique median of Fj. Further Fj 3 and / ; ° are 
defined via the relationship Fi(y) = F®(y — m) and fi(y) = /;°(y — m), respectively. 
Gneiting and Ranjan (|2013) note that for neutrally dispersed or overdispersed 


components, a value of c < 1 may be appropriate, while for underdispersed com¬ 
ponents, a value c > 1 is suggested. A value of c = 1 corresponds to the standard 
linear pool. 

Typically a common spread parameter is used for all components, although the 
method can be generalized to have spread parameters varying with the components. 
However, this may be appropriate only in case the dispersion properties of the 
components differ to a high extent. 
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3. Application to European Center for Medium Range Weather 
Forecasts of temperature 

The data set for our case study comprises an ensemble with m = 50 members 
(and one control forecast not used here) of the European Centre for Medium-Range 
Weather Forecasts (ECMWF, see e.g. Molteni et al. 1996). Initialized at 00 UTC, 
the forecasts are issued on a grid with 31 km resolution, and they are available for 
forecast horizons in 3 hour steps up to 144 hours. In Germany 00 UTC corresponds 
to lam local time, and to 2am local time during the daylight saving period. For our 
analysis we consider 24-h ahead forecasts for 2-m surface temperature in Germany 
along with the verifying observations at different stations in the time period ranging 
from 2010-02-02 to 2011-04-30. Although there is a total of 518 stations in the full 
data set, only 383 stations with complete T = 453 observations for the variables 
were retained. To use the forecasts in combination with globally distributed surface 
synoptic observations (SYNOP) data, they are bilinearly interpolated from the 
four surrounding grid points to the locations that correspond to actual observation 
stations. The observation data from stations in Germany were provided by the 
German Weather Service (DWD). Figure |T] shows the locations of the 383 stations 
within Germany, where the station Frankfurt a.M. is marked as a filled circle. 


3.1. Forecast error of ensemble mean. For each of the 383 stations we com¬ 
pute the series Z(t) = Y(t) — X(t) of forecast errors of the ensemble mean, where 
t ranges over the whole time period. To check for independence, we apply the 
Ljung-Box test (Ljung and Box} 1978) based on lag 1, which is available in R as 
function Box.test. All 383 computed p-values are not greater than 0.046 (the 
largest occurring value), indicating substantial autocorrelation in the forecast error 
series for each station. Figure [2] further illustrates this point by showing the series 
of temperature observations together with the ensemble mean, the corresponding 
forecast errors, and the autocorrelation function (ACF) of the series of forecast 
errors for the randomly chosen station Ruppertsecken in Rhineland-Palatinate. 

To account for autocorrelation in a series of forecast errors, the following sub¬ 
sections present an approach to obtain a predictive distribution based on the AR 
modification method (AR-EMOS), while Section [4] discusses AR-EMOS for the sta¬ 
tion Frankfurt a.M. with regard to a much longer than before verification period of 
3650 days. 


3.2. Length of training period for fitting the AR model. When the mean 
X(t) of the raw ensemble is considered as a deterministic-style forecast 77 (f), a 

corresponding AR modified forecast 77 (f) = X (t ) can be generated according to the 
procedure described in Section [2~2j For computing the order and the coefficients 
of the AR(p) process, a training period of Ti days previous to the forecast day is 
considered. For each day out of the T% = T — T\ remaining days from the forecast 
(verification) period, the AR coefficients are newly estimated, where the training 
period is shifted accordingly (rolling training period). 

When choosing an appropriate length of training period, there is usually a trade¬ 
off. Longer training periods yield more stable estimates, but may fail to account 
for temporal changes, while shorter periods can adapt more successfully to changes 
over time. To learn about an appropriate training length, a variety of possible 
values, namely T) = 30,60,90,120,150,180,210, were considered in a preliminary 
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Table 1. MAEs for T2 = 363 days averaged over 383 stations. 


v(t) 

m 

X(t) 

X(t) 

X(t) 

X(t) 

X(t) 

MAE 

2.911 

2.024 

2.007 

2.918 

2.037 

2.018 


analysis, and for each station the mean absolute error 

( 11 ) 

2 t=1 

of the AR modified forecast rj(t ) = X(t) has been computed. As it turned out, a 
training length of Tf = 90 days provided a favorable performance with respect to 
the MAE averaged over the considered stations. This number of days will be used 
as training length for fitting the AR process in all further analyses. 

3.3. AR modified ensemble. In a second step the procedure from the previous 
subsection is applied directly to each ensemble member. For each station, this 
procedure generates an AR modified ensemble for the forecast period of length T 2 . 
Table [T] shows a comparison of MAEs corresponding to certain deterministic-style 
forecasts. Here, X(t) is the AR modification of the raw ensemble mean, while 

X(t) is the mean of the AR modified ensemble members. For the dot superscript 
denoting the median, the two definitions are analog. 

For the mean as well as the median, the approach of first applying the AR modi¬ 
fication to each member of the raw ensemble and then computing the deterministic 
style forecast from the modified ensemble yields smaller MAE values than the re¬ 
verse procedure of applying the AR modification directly to the deterministic style 
forecast computed from the original ensemble. As it turns out, the mean of the 
AR modified ensemble X ( t ) performs best among the considered deterministic-style 
forecasts with respect to MAE and will therefore be the basis for constructing a 
predictive probability distribution in the following. 


3.4. Predictive distribution. The above may be seen as a preliminary to the 
development of a statistical postprocessing method yielding a full predictive distri¬ 
bution, based on the AR modification. The EMOS procedure described by[Gneiting 
et al. (2005) and shortly reviewed in Section 2.1 assumes a Gaussian predictive dis¬ 


tribution 

( 12 ) 




for a weather quantity Y(t), given the ensemble forecasts (Xi(t),..., X m (t)). Here, 
£(f) is a linear combination of the ensemble members and er 2 (f) is a linear function of 
the ensemble variance, as described in Equations 0, @ (exchangeable ensemble 
version) and ©• The coefficients are obtained station-wise (local EMOS) by a 
minimum-CRPS estimation procedure. 

To obtain a predictive distribution based on our AR modification (AR-EMOS) we 
employ a plug-in strategy replacing £(t) by the mean of the AR modified ensemble 


X(t) from step 2 outlined in Section 3.3 When the deterministic-style forecast ?y(f) 
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Figure 3. Univariate verification rank histogram and PIT his¬ 
tograms over 383 stations and all dates in the verification period. 


in © is considered as non-random, 


(13) 


Var = Var (Z(t)) = a 2 e [ 1 + 


3 = 1 


where er 2 is the variance of e(t) and the 'ipj are the coefficients of the one-sided 
linear process (moving average) representation of Z(t ), 

(2006 Section 3.2). 


cp. 


Shumway and Stoffer 


We obtain the ipj coefficients by replacing the unknown AR coefficients by their 
estimates and solve a homogenous difference equation, see. e.g. |Shumway and 
Stoffer (2006] Example 3.10). In R this can be done by the function ARMAtoMA. By 
replacing crj by its estimate and using only the first ten ^-weights, we compute 
an estimate of Var (Y(t)) for every ensemble member 77 (f) = W(t). The Gaussian 
parameter er 2 (t) is then estimated as the average variance estimate. 

The local EMOS procedure itself requires an additional training period. Prelim¬ 
inary investigations suggested that a length of 25 days is optimal for the data set 
of our case study. The training period for fitting the AR process together with the 
training period for fitting the EMOS model yields a forecast period of T 2 = 338 
days (ranging from 2010-05-28 to 2011-04-30) for verification purposes. 
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Table 2. Verification statistics averaged over Ti = 338 days and 
383 stations. 



EMOS 

AR-EMOS 

SLP 

MAE 

2.042 

2.036 

1.969 

CRPS 

1.471 

1.460 

1.407 

DSS 

3.135 

2.908 

2.821 


The MAE of X(t) averaged over the stations in this final verification period is 
2.036, and thus slightly smaller than the corresponding averaged MAE 2.042 of 
the local EMOS predictive mean, see Table [2] By considering the station averaged 
CRPS, the local EMOS predictive distribution admits a value of 1.471 while the 
CRPS of the local AR-EMOS distribution shows a slightly better value of 1.460. 
The DSS score (see Section 3.5) of local AR-EMOS is smaller than the value of 
local EMOS as well. A comparison of the rank histogram for the raw ensemble, 
and probability integral transform (PIT) histograms for the local EMOS method 
and the local AR-EMOS displayed in Figure[3]indicates a good calibration of the AR 
modification. However, while local EMOS still shows signs of underdispersion, the 
small hump shape of local AR-EMOS reveals a slight tendency to the reverse effect 
(overdispersion). These contradictory dispersion properties suggest a combination 
of both predictive distributions. The combination with a spread-adjusted linear 
pool is discussed in the next section and reveals an improvement in calibration 
compared the individual predictive distributions. 


3.5. Combination of predictive distributions. In this subsection we introduce 
a spread-adjusted combination of the local EMOS and the local AR-EMOS pre¬ 
dictive distributions obtained in the previous section. We perform the aggregation 
of the distributions in line with the spread-adjusted linear pool (SLP) formula of 

, that we shortly reviewed in Section [2.3| 

We define the SLP combined cumulative distribution function according to Equa¬ 
tion <© with k = 2 in our case. Now, let <f> and $ denote the probability density 
function (PDF) and the cumulative distribution function (CDF) of the standard 
normal distribution, respectively. If the two component CDFs we wish to combine 
are normal, then Ff(y) = &(y/ai), l = 1,2, in Eq. and the SLP combined 
predictive CDF is 


Gneiting and Ranjan (2013 


(14) 


F(y) =w 1 G 1 (y) + w 2 G 2 (y), Gi(y) = $ 


y~(M 

(JlC 


l = 1,2, where w\ is nonnegative, W 2 = 1 — w\, and c is a strictly positive spread 
adjustment parameter (Gneiting and Ranjan 2013). The choice c= 1 corresponds 
to the traditional linear pool (TLP). The expectation y,p and the variance <j‘ 2 f of 
the CDF F are 


(15) 


2 

hf = Yw m 

i=i 


and 


2 

2 / 2 , 2 2 \ 2 
a F — 2_^ + c ) — [i F , 

l=i 


(16) 
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allowing for straightforward computation of the Dawid and Sebastiani ([1999) score 
(17) 


DSS (F,y ohs ) = ( Vohs / f) ~ +2log a F , 


see also Gneiting and Katzfuss (2014). 


From Equation (5) in Grimit et al. (2006), the CRPS 
(18) 


/ OO 

{F(y) - i (y > yobs )} 2 d y , 

-oo 


where F is of the form (|14j), can also be written as 

2 

CRPS {F,y ohs ) = y ^wiA(y obs - p.i,c 2 af) 


(19) 


l=i 
2 2 


-,EE wiw k A(m - p k , c 2 (af + al)) 


1 = 1 k = 1 


where 

( 20 ) 


A(n, a 2 ) = 2 a(j) + p ( 2 $ 0) - l) 

is the expectation of the absolute value of a A f(p, a 2 ) distributed random vari¬ 
able. Since there exist approximations of $ up to an arbitrary precision, see e.g. 


Abramowitz and Stegun (19721, Formula (19) is easy to evaluate. 


3.5.1. Choice of appropriate weights. To gain some insight about an appropriate 
choice of the SLP parameters when combining local EMOS and local AR-EMOS, 
we investigate a grid of combinations of values for w± (1V2 is fully determined by w 1 ) 
and c. More precisely, we obtain the SLP combined predictive distribution for all 99 
combinations w\ = 0.0, 0.1, ..., 0.9, 1.0 (w? = 1 — w\) and c = 0.6, 0.7,... 1.3,1.4. 

For each of the above mentioned combinations, the average DSS and CRPS over 
X 2 = 338 days and 383 stations is computed. As it turns out, the minimal average 
DSS and CRPS values both occur for the simple unfocused combination w\ = W 2 = 
0.5 and c = 1. As it seems, within the unfocused SLP combination the contradictory 
dispersion properties of local EMOS and local AR-EMOS mutually compensate, 
yielding a predictive distribution with improved calibration. To this effect, the 
accordingly combined predictive CDF is favourable upon both local EMOS and local 
AR-EMOS with respect to the DSS and the CRPS, and performs better with respect 
to the MAE as well, see Table [2] Figure [3] shows the verification rank histogram 
of the raw ensemble and the PIT histogram of the three considered postprocessing 
methods. The rank of the raw ensemble (top right panel) exhibits a very pronounced 
U-shape indicating a strong underdispersion and a need for postprocessing. The 
PIT histogram of the SLP combination, displayed in the bottom right panel of 
Figure [3j shows highly improved dispersion properties. While local EMOS (top 
right panel) clearly exhibits a U-shape and local AR-EMOS (bottom left panel) a 
slight hump-shape, the SLP combination is corrected for both types of dispersion 
errors, resulting in PIT histogram that is close to uniformity. 

The improved dispersion properties visible in the PIT histogram are further 
supported by Table [3] showing the variance of the PIT values and the root of 
the mean variance of the respective postprocessing methods. A similar table was 


considered in Gneiting and Ranjan (2013) to investigate the proposed types of 
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Table 3. PIT variance (dispersion) and density forecast root 
mean variance (sharpness) for 383 stations and 338 verfications 
days. 



Var(PIT) 

RMV 

Local EMOS 

0.101 

2.30 

Local AR-EMOS 

0.079 

2.73 

SLP Combination (uq = 0.5, c = 1.0) 

0.083 

2.62 


combinations of predictive distributions. The variance of the PIT values provides 
further information on the dispersion properties of the distribution, a variance equal 
to Yg = 0.0833 corresponds to the variance of the uniform distribution on [0,1], 


thus indicating neutral dispersion (Gneiting and Ranjan 2013). The root mean 
variance is used as a sharpness measure of the predictive distributions. A main 
principle of probabilistic forecasting is “maximizing the sharpness of the predictive 
distribution subject to calibration” (see e.g. Gneiting and Katzfuss 2014), therefore 
the sharpness should be investigated in conjunction with the calibration. When 
looking at Table [3] we can see that the PIT values of local EMOS have a variance 
> A, indicating underdispersion, while for the PIT values of local AR-EMOS we 


have a variance slightly smaller than 12 , indicating overdispersion (Gneiting and 
Ranjan 2013). The variance of the PIT values of our derived SLP combination is 


virtually identical to -pj. These results are in line with the PIT histograms in Figure 
[3] When investigating the sharpness in terms of the root mean variance, we see 
that the local EMOS predictive distribution is the sharpest, while local AR-EMOS 
the least sharpest one. The sharpness of our SLP combination lies between the 
other two. Although local EMOS exhibits the highest sharpness, it strongly lacks 
calibration, leading to a deterioration in predictive performance. On the contrary, 
local AR-EMOS shows a decreased sharpness, but is uncalibrated as well. In view of 
the sharpness principle mentioned above, our SLP combination is able to combine 
improvement in sharpness and calibration in comparison to both, local EMOS and 
local AR-EMOS. 


3.5.2. Testing for forecast accuracy. The improvement in the verification scores of 
our derived SLP combination over the local EMOS method may also be investigated 
for significance by testing for equal predictive performance of the two methods with 
the Diebold-Mariano test for time series, see Gneiting and Katzfuss (2014). 

, 2, denote the series of mean CRPS values (averaged over all 383 


Let g[ J \ j = 1, 


stations) of local EMOS and our unfocused SLP combination derived in Section 
ra for the verification period of length T 2 = 338, respectively. Then the large- 
sample standard normal test statistic adapted from Diebold and Mariano (1995) 


is 


( 21 ) 


S=s/T 2 




where 
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Raw Ensemble Local EMOS 



1 11 21 31 41 51 0.0 0.2 0.4 0.6 0.8 1.0 

Rank PIT 

Local AR-EMOS SLP Combination 



0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 

PIT PIT 

Figure 4. Univariate verification rank histogram and PIT his¬ 
tograms of local EMOS, local AR-EMOS, and SLP combination 
(with w\ = W 2 = 0.5, c = 0.9) over 3650 days for station Frankfurt 
a.M. 

is the average CRPS differential and 

( 23 ) %{t) = y 5Z ( d t-d)(d t _ ]T \-d) 

2 t=M+i 

are the empirical autocovariances. 

Usually the truncation lag h— 1 refers to h -step ahead forecast errors, suggesting 
the choice h = 1 here. 

When comparing the differential series of local EMOS and the combined distri¬ 
bution for the case h = 1, yields S = 7.0621, which, compared to the standard 
normal distribution, gives a clear indication of nonequal predictive performance of 
the two methods in question and therefore of improvement of our above derived 
SLP combination upon local EMOS. Higher choices of h yielded different values of 
the test statistic S, all of which were found as highly significant. 

4. Application to ECMWF data at a single station 

The above verifying results are obtained by averaging over all stations used for 
the case study. Since the discussed methods refer to a stations-wise approach, it 
should also be of interest to investigate their performance with respect to a single 
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Table 4. Order p of autoregressive fit for 3650 forecast days at 
Frankfurt a.M. 


V 

0 

1 

2 

3 

4 

5. ..15 

Freq. 

1314 

1443 

391 

130 

197 

175 


station. For this analysis we use the 50 member ECMWF forecast ensemble and 
SYNOP observations as well. However, we employ a data base containing about 
10 years and ranging from dates in 2002 up to 2012, which is part of a larger data 
set investigated in Hemri et al. ( 2014| ). To this end, the station Frankfurt a.M. 
in the south of Germany is exemplarily observed during a verification period from 
2002-04-27 to 2012-04-23 (3650 days). 


4.1. Investigation of autoregressive fit. For illustrative purposes, the AR mod¬ 
ification introduced in Section |2.2| is applied to the raw ensemble mean of each 
forecast day based on a training period of Ti = 90 previous days. Table [4] displays 
the absolute frequencies of the order p chosen by the minimal AIC criterion for 
an autoregressive fit to the error series. Although 1314 out of 3650 of these series 
used for prediction do reveal no substantial autocorrelation, thus yielding a simple 
AR(0) fit, the majority of cases exhibits an autoregressive structure that needs to 
be accounted for. The by far most prominent type of autoregressive structure is the 
AR(1) process, fitted to the error series in 1443 cases. Even higher orders of p = 2, 
p = 3 or p = 4 appear quite frequently. Only for orders of p = 5 or higher, the 
number of cases is decreasing. This observation is in line with other applications of 
AR processes (in econometric as well as in environmental/meteorological applica¬ 
tions like those briefly discussed in Section [l]), most of the dependence structures 
present in data can be covered by autoregressive processes of low orders (e.g. p = 1 
or p = 2). 

The analysis of the orders p fit to the error series reveals the usefulness and 
flexibility of our proposed method. It is able to adapt to the type of autoregressive 
dependence of the errors. In cases where there is a substantial autocorrelation 
present in the series, the AR-modification method chooses an appropriate order 
and corrects the ensemble for the autoregressive structure by performing an AR- 
modified bias correction. In cases where there is no substantial autocorrelation, 
our method performs a simple bias correction based on the past Xj values of the 
forecast errors. 


4.2. Combination of predictive distributions. By proceeding along the same 
lines as in Section [3j we set up the local AR-EMOS predictive distribution based 
on applying the AR-modification to the raw ensemble as in Section 3.4 and in a 


second step obtain the SLP combination of local EMOS and local AR-EMOS as 
described in Section [3751 

We investigated the same grid of 99 values for and c that was considered in 
Section [375] to obtain the optimal SLP parameters with respect to the CRPS. As 
it turns out, the combination with weights w\ = W 2 = 0.5 and spread-adjustment 
c = 0.9 performs best with respect to a minimal CRPS. 

Figure |4] shows the verification rank histogram of the raw ensemble and the PIT 
histograms of the three methods over 3650 days, behaving similar to the PIT his¬ 
tograms computed over all 383 stations in Figure [3] Particularly the raw ensemble 
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Table 5. PIT variance (dispersion) and density forecast root 
mean variance (sharpness) for station Frankfurt a.M. for 3650 ver¬ 
ification days. 



Var(PIT) 

RMV 

Local EMOS 

0.098 

1.50 

Local AR-EMOS 

0.068 

1.82 

SLP Combination ( w\ = 0.5, c = 0.9) 

0.086 

1.54 


and local EMOS exhibit an additional forecast bias. While the raw ensemble tends 
to underestimate the temperatures at Frankfurt a.M., local EMOS tends to overes¬ 
timate it. The hump shape of the local AR-EMOS predictive distribution already 
seen in Figure [3j is even more pronounced when investigating the PIT values of a 
single station. As in Figure [3j our SLP combination is closest to uniformity, al¬ 
though the first bin is more occupied than in the overall PIT histogram of Figure 
[3] These observations are in line with the results of Table [5j showing the variance 
of the PIT values and the root mean variance as a measure of sharpness. Similar 
to the situation presented in the left panel, the variance of the local EMOS PIT 
values is larger than , indicating underdispersion, while the variance of the local 
AR-EMOS PIT values is smaller than , indicating overdispersion. When consid¬ 
ering the PIT values only for the station Frankfurt a.M., the variance of the PIT 
values is even smaller than in the overall case (Table [ 3 ]) , corresponding to the more 
pronounced hump-shape. The variance of the PIT values of the SLP combination 
differs only slightly from A, indicating that the PIT histogram is close to unifor¬ 
mity. The sharpness properties of the predictive distribution directly correspond 
to those in Table [3] While local EMOS is sharpest and local AR-EMOS is least 
sharpest, our SLP combination has a sharpness value between the other two. 

The Diebold-Mariano test statistic introduced in Section 3.5.2 for comparing 
the predictive performance of local EMOS with our derived SLP combination (here 
applied to the CRPS series at the station Frankfurt) is S = 9.4624 for h = 1, 
thereby indicating a highly significant nonequal predictive performance of the two 
methods. 


5. Concluding Remarks 

In this work, we propose a basic AR-modification method that accounts for 
potential autoregressive structures in forecast errors of the raw ensemble. The 
AR-modification is straightforward to compute by standard R functions to fit AR 
processes and can be utilized in the context of ensemble forecasts in different ways. 
It can simply be employed to obtain an adjusted ’’raw” ensemble of forecasts that 
is corrected for autoregressive structures or it can be used to construct different 
types of predictive distributions. 

To this end, our proposed modification is simple and yet effective. In our case 
study we suggest to built an EMOS-like predictive distribution based on the AR- 
adjusted ensemble and in a second step obtain an aggregated predictive distribution 
that comprises of the state-of-the-art EMOS predictive distribution and our AR- 
EMOS variant. As EMOS is a well-established standard postprocessing procedure 
that is easy to compute, our proposed extension can be easily constructed. Further, 
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the approach is neither restricted to a specific postprocessing model nor to temper¬ 
ature forecasts. A modification of the approach allowing to combine the method 
with other standard postprocessing models such as BMA should be straightforward. 
Our AR-modification approach allows for a flexible bias-correction based on fitting 
AR process to the error series. The method identifies cases where a correction for 
autoregressive structures is indicated and performs a simple bias-correction based 
on past values in cases where no substantial autocorrelation is present. In the con¬ 
sidered case study all our derived variants based on the AR-modification approach 
improve on the standard EMOS method. While the improvement of the AR-EMOS 
predictive distribution on the standard EMOS distribution is only small and AR- 
EMOS still lacks calibration, the SLP combined predictive distribution based on 
EMOS and AR-EMOS shows a PIT histogram close to uniformity and the improve¬ 
ment on standard EMOS with respect to the verification scores is highly significant. 

In line with the recently increased interest in multivariate postprocessing mod¬ 
els that yield physically coherent forecasts, it should be of interest to extend our 
method to this field of research. An approach that allows to retain dependence 
structures with low computational cost is the Ensemble Copula Coupling (ECC) 
method introduced by Schefzik et al. (2013). ECC is able to recover temporal, 
spatial and inter-variable dependencies present in the raw ensemble by reordering 
samples from the predictive distributions according to the rank structure of the 
raw ensemble. It is a flexible and computationally efficient method, as one needs 
to compute simply the rank order structure of the raw ensemble. A combination 
e.g. of our SLP predictive distribution with ECC would be straightforward and 
easy to compute. This procedure can account for spatial dependence structures be¬ 
tween the stations not considered by our station-wise approach and it may even be 
able to recover additional temporal dependencies from the raw ensemble, that are 
not explicitly modelled by our AR-approach that only considers the autoregressive 
structure in the errors. 

While ECC is a nonparametric method that can recover different types of mul¬ 
tivariate structures simultaneously, there are several parametric approaches to in¬ 


corporate spatial or inter-variable dependencies. Berrocal et al. (2007) and Kleiber 


et al. (2011) for example propose spatially adaptive extensions of the basic BMA 


method, while Scheuerer and Buermann (20141, Scheuerer and Konig (2014) and 


Feldmann et al. (2015) investigate different ways of extending EMOS to incorporate 


spatial dependencies. There is also an interest to investigate inter-variable depen¬ 
dencies. For example Schuhen et al. |2012) develop a bivariate EMOS model for 
wind vectors and Baran and Moller (2015) a bivariate BMA model for temperature 
and wind speed. Further, Moller et al. (2013) investigate a general multivariate set¬ 
ting that allows to combine arbitrary univariate postprocessing distributions within 
a Gaussian copula framework. 

The development of multivariate postprocessing models is a very active area 
of research, and an extension of our proposed AR-modification that incorporates 
spatial or inter-variable dependencies as well should be highly beneficial. 
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